Graph Clustering for Keyword Search
Abstract
Keyword search on data represented as graphs has received a lot of attention in recent years. Early keyword search systems assumed that the graph is memory resident. However, in some applications the graph can be much larger than the available memory. This led to the development of search algorithms that search on a smaller, memory-resident summary graph (the supernode graph) and fetch parts of the original graph from disk only when required. In this scenario, good clustering of nodes into supernodes when constructing the summary graph is key to efficient search. In this paper, we address the issue of graph clustering for keyword search, using a technique based on random walks. We propose an algorithm, which we call the Modified Nibble clustering algorithm, that improves upon the Nibble algorithm proposed earlier. We outline several policies that can improve its performance. We then compare our algorithm with two graph clustering algorithms proposed earlier, EBFS and kMetis. Our performance metrics include edge compression, keyword search performance, and the time and space overheads of clustering. Our results show that Modified Nibble outperforms EBFS uniformly and outperforms kMetis in some settings. Further, the memory requirements of our algorithm are much lower than those of kMetis, making it practical even with a very large number of nodes, unlike kMetis.
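As a rough illustration of the summary-graph idea, the sketch below collapses a small undirected graph into a supernode graph given a node-to-cluster assignment, and reports an edge compression figure. The function names, the input format, and the definition of edge compression used here (original edge count divided by the number of distinct supernode edges) are assumptions made for this example; the abstract does not define the metric, and this is not the Modified Nibble algorithm itself.

```python
from collections import defaultdict


def build_supernode_graph(edges, cluster_of):
    """Collapse an undirected graph into a supernode (summary) graph.

    edges: iterable of (u, v) node pairs from the original graph.
    cluster_of: dict mapping each node to a supernode id (hypothetical
        input; it could come from any clustering algorithm).
    Returns (super_edges, inner_edge_count):
        super_edges: set of unordered supernode pairs joined by >= 1 edge.
        inner_edge_count: dict of supernode id -> edges fully inside it.
    """
    super_edges = set()
    inner_edge_count = defaultdict(int)
    for u, v in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu == cv:
            # Edge stays inside one supernode, so it lives on disk only.
            inner_edge_count[cu] += 1
        else:
            # Many original edges may map to the same supernode edge.
            super_edges.add((min(cu, cv), max(cu, cv)))
    return super_edges, dict(inner_edge_count)


def edge_compression(edges, cluster_of):
    """Assumed metric: original edges / distinct supernode edges."""
    edges = list(edges)
    super_edges, _ = build_supernode_graph(edges, cluster_of)
    if not super_edges:
        return float("inf")  # everything collapsed into isolated supernodes
    return len(edges) / len(super_edges)


# Two triangles joined by a single bridge edge (3, 4).
example_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
example_clusters = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
example_super_edges, example_inner = build_supernode_graph(
    example_edges, example_clusters
)
example_ratio = edge_compression(example_edges, example_clusters)
```

Under these assumptions, a clustering that keeps each triangle together leaves a single supernode edge in memory, whereas a clustering that splits the triangles would leave more supernode edges and a lower ratio.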
Similar resources
An Effective Path-aware Approach for Keyword Search over Data Graphs
Abstract—Keyword Search is known as a user-friendly alternative for structured languages to retrieve information from graph-structured data. Efficient retrieving of relevant answers to a keyword query and effective ranking of these answers according to their relevance are two main challenges in the keyword search over graph-structured data. In this paper, a novel scoring function is proposed, w...
Exploiting Semantic Result Clustering to Support Keyword Search on Linked Data
Keyword search is by far the most popular technique for searching linked data on the web. The simplicity of keyword search on data graphs comes with at least two drawbacks: difficulty in identifying results relevant to the user intent among an overwhelming number of candidates and performance scalability problems. In this paper, we claim that result ranking and top-k processing which adapt sche...
A partition-based algorithm for clustering large-scale software systems
Clustering techniques are used to extract the structure of software for understanding, maintaining, and refactoring. In the literature, most of the proposed approaches for software clustering are divided into hierarchical algorithms and search-based techniques. In the former, clustering is a process of merging (splitting) similar (non-similar) clusters. These techniques suffered from the drawba...
Finding Community Base on Web Graph Clustering
Search Pointers organize the main part of the application on the Internet. However, because of information management hardware, the high volume of data, and word similarities across different fields, most answers to users' questions aren't correct. So web graph clustering and cluster placement in corresponding answers help the user achieve his or her intended results. Community (web communit...
Keyword Generation for Lyrics
This paper proposes a scheme for content-based keyword generation of song lyrics. Syntactic as well as semantic similarity is used for sentence-level clustering to separate the topic from the background of a song. A method is proposed to search for a center in the semantic graph of WordNet for generating keywords not contained in the original text.
Publication date: 2009